Abstract

This work investigates relationships among the convergence rates for the variable $x$, for the multiplier $\lambda$ and for the pair $(x, \lambda)$ in SQP methods for equality constrained optimization. Key contributions are: if the convergence in $(x, \lambda)$ and also in $x$ is q-superlinear, then the convergence in $\lambda$ is either q-superlinear or q-sublinear with unbounded $q_1$ factor, and if the convergence in $(x, \lambda)$ is q-superlinear, then the convergence in $x$ is at least two-step q-superlinear. It is noted that a theorem of Fontecilla, Steihaug and Tapia leads to a characterization result which is potentially more useful than the Boggs-Tolle-Wang characterization. Finally, two different conditions that guarantee q-superlinear convergence in $x$, $\lambda$ and $(x, \lambda)$ for an SQP method are derived.

1 Introduction


In this work we will be concerned with the equality constrained optimization problem

$$ \begin{array}{ll} \text{minimize} & f(x) \\ \text{subject to} & h_i(x) = 0, \quad i = 1, 2, \ldots, m, \end{array} \tag{1.1} $$

where $f$, $h_i$ are nonlinear functions defined from $\mathbb{R}^n$ into $\mathbb{R}$.

We denote by $h(x)$ the vector whose components are $h_i(x)$, $i = 1, \ldots, m$. The Lagrangian function associated with problem (1.1) is the function

$$ \ell(x, \lambda) = f(x) + \lambda^T h(x) \tag{1.2} $$

where $\lambda = (\lambda_1, \ldots, \lambda_m)^T$ is called the vector of Lagrange multipliers or simply the Lagrange multiplier. The augmented Lagrangian function associated with problem (1.1) is the function

$$ L(x, \lambda; \rho) = f(x) + \lambda^T h(x) + \tfrac{1}{2} \rho \, h(x)^T h(x) \qquad (\rho > 0). \tag{1.3} $$

The algorithm we are interested in is the successive quadratic programming (SQP) Lagrangian quasi-Newton method:

ALGORITHM (SQP Method):

For $k = 0, 1, \ldots,$ until convergence do

$$ x_{k+1} = x_k + s_k \tag{1.4} $$

$$ \lambda_{k+1} = \lambda_k + \Delta\lambda_k \tag{1.5} $$

$$ B_{k+1} = \mathcal{B}(x_k, s_k, \lambda_{k+1}, B_k) \tag{1.6} $$

where $s_k$ and $\Delta\lambda_k$ are the solution and the multiplier associated with the solution of the quadratic program

$$ \begin{array}{ll} \text{minimize} & \nabla_x \ell(x_k, \lambda_k)^T s + \tfrac{1}{2} s^T B_k s \\ \text{subject to} & \nabla h(x_k)^T s + h(x_k) = 0. \end{array} \tag{1.7} $$

The matrix $B_{k+1}$ is interpreted as an approximation to $\nabla_x^2 \ell(x_{k+1}, \lambda_{k+1})$. When the augmented Lagrangian is substituted for the Lagrangian in the SQP method we call the resulting algorithm the SQP augmented Lagrangian quasi-Newton method. For further details on these various SQP formulations see Appendix A of Tapia (1988).
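The iteration (1.4)-(1.5) with subproblem (1.7) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the toy problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, with solution $x_* = (-1, -1)$, $\lambda_* = 1/2$), the starting point, and the helper names `solve` and `sqp_newton` are hypothetical, and $B_k$ is taken to be the exact Hessian of the Lagrangian (the Newton variant) rather than a secant update (1.6).

```python
def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def sqp_newton(x, lam, iterations=8):
    """Assumed toy problem: minimize x1 + x2 subject to x1^2 + x2^2 - 2 = 0."""
    history = []
    for _ in range(iterations):
        h = x[0] ** 2 + x[1] ** 2 - 2.0
        grad_h = [2.0 * x[0], 2.0 * x[1]]
        # Gradient of the Lagrangian f + lam * h with respect to x.
        grad_lag = [1.0 + lam * grad_h[0], 1.0 + lam * grad_h[1]]
        # B_k: exact Hessian of the Lagrangian (Newton choice, not a secant update).
        b_k = [[2.0 * lam, 0.0], [0.0, 2.0 * lam]]
        # First-order conditions of the quadratic program (1.7):
        #   B_k s + grad_lag + grad_h * dlam = 0,   grad_h^T s + h = 0.
        kkt = [
            [b_k[0][0], b_k[0][1], grad_h[0]],
            [b_k[1][0], b_k[1][1], grad_h[1]],
            [grad_h[0], grad_h[1], 0.0],
        ]
        rhs = [-grad_lag[0], -grad_lag[1], -h]
        s1, s2, dlam = solve(kkt, rhs)
        x = [x[0] + s1, x[1] + s2]  # update (1.4)
        lam = lam + dlam            # update (1.5)
        history.append((x[:], lam))
    return x, lam, history


x_final, lam_final, history = sqp_newton([-1.5, -0.5], 1.0)
```

The quadratic program is solved here through its linear first-order system, which is adequate only because the constraint is an equality and the toy Hessian is positive definite near the solution.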

We  begin  our  study  with  a  review  of  terminology  concerning  convergence 
rates  for  iterative  methods.  For  the  most  part  we  follow  Chapter  9  of  Ortega 
and  Rheinboldt  (1970).  However,  our  definition  of  r-convergence  is  essentially 
that  of  Dennis  and  Schnabel  (1983),  which  is  known  to  be  equivalent  to  the 
notion  considered  by  Ortega  and  Rheinboldt  (1970). 

Let $\{x_k\} \subset \mathbb{R}^n$ be a convergent sequence with limit $x_*$ and assume that $x_k \ne x_*$ for all $k$. Consider a vector norm $\|\cdot\|$ on $\mathbb{R}^n$. For $p \in [1, \infty)$ the quantities

$$ q_p = \limsup_{k \to \infty} \frac{\|x_{k+1} - x_*\|}{\|x_k - x_*\|^p} $$

are called the $q_p$ factors of the sequence $\{x_k\}$ with respect to the norm $\|\cdot\|$.

We define the q-order of convergence of $\{x_k\}$ to be $\inf\{p : q_p = \infty\}$. The $q_1$ factor will be of particular interest to us. If $q_1 < 1$, then the convergence is said to be q-linear, while if $q_1 > 1$, then the convergence is said to be q-sublinear. Clearly the ideal situation is when $q_1 = 0$ and in this case the convergence is said to be q-superlinear. The least ideal case is when $q_1 = +\infty$. We will refer to this convergence as q-sublinear with unbounded $q_1$ factor.

Suppose that we have $\{b_k\}$ converging to zero and such that $\|x_k - x_*\| \le b_k$ for all $k$. If the sequence $\{b_k\}$ possesses a particular q-convergence property, then the sequence $\{x_k\}$ is said to possess the corresponding r-convergence property.

If for each $k$ the subsequence $x_k, x_{k+j}, x_{k+2j}, \ldots$ displays a particular convergence behavior, then we say that the original sequence $x_1, x_2, \ldots$ has this j-step convergence behavior.

It is of interest to observe that r-convergence properties are norm independent; so are the notions of q-order, q-superlinear and q-sublinear with unbounded $q_1$ factor. However, the notions of q-linear and q-sublinear are norm dependent.

Convergence of order 2 is said to be quadratic and that of order 3 is said to be cubic. Unfortunately this standard terminology is such that a q-order of 1 does not imply q-linear convergence.

The SQP Lagrangian method for equality constrained optimization is a part of the optimization theory folklore. It was certainly known to researchers in the calculus of variations in the early 1900's. The fact that the SQP Lagrangian Newton method, i.e. $B_k = \nabla_x^2 \ell(x_k, \lambda_k)$, is q-quadratically convergent in the pair $(x, \lambda)$ is also a part of the optimization folklore. For a proof see Tapia (1977). The SQP Lagrangian Newton method for problems with inequality constraints is usually credited to Wilson (1963).

The convergence of special SQP secant methods has been investigated by many authors. The first were probably Garcia-Palomares and Mangasarian (1976). They posed an SQP secant method and proved various r-convergence results in $(x, \lambda)$ under certain assumptions. Han (1976), (1977), Tapia (1977) and Glad (1979) independently established local and q-superlinear convergence in $(x, \lambda)$ for several SQP secant methods. Boggs, Tolle and Wang (1982) showed that the convergence in $x$ for the SQP DFP and BFGS secant methods was q-superlinear assuming that $\{x_k\}$ converged q-linearly. They obtained a characterization of q-superlinear convergence in $x$ also assuming that $\{x_k\}$ converged q-linearly. Fontecilla, Steihaug and Tapia (1987) established q-superlinear convergence in $x$ for the SQP PSB, DFP and BFGS secant methods and also derived the Boggs-Tolle-Wang characterization of q-superlinear convergence in $x$ as a special case of a more general characterization. These results did not require the assumption that $\{x_k\}$ converged q-linearly.

When the secant update in question was the DFP or the BFGS update, all of the above authors were forced either to assume that the Hessian with respect to $x$ of the Lagrangian at the solution was positive definite or to work with the SQP augmented Lagrangian method with the penalty constant chosen sufficiently large. Each of these two alternatives is somewhat undesirable: the first because the Hessian with respect to $x$ of the Lagrangian is not in general positive definite at the solution, and the second because the SQP augmented Lagrangian method is known to be sensitive to the choice of the penalty constant $\rho$, and adequate guidelines for this choice seem to be impossible to develop.

This  unfortunate  state  of  affairs  motivated  Powell  (1978)  to  propose  an 
ad  hoc  modification  to  the  SQP  Lagrangian  BFGS  secant  method  which 
compensates  for  the  lack  of  positive  definiteness  in  the  Hessian  at  the  solution. 
Assuming  convergence,  Powell  was  able  to  show  that  his  modified  SQP  BFGS 
secant  method  gave  r-superlinear  convergence  in  the  variable  x. 

The dilemma described above, i.e. the lack of positive definiteness of the Hessian with respect to $x$ of the Lagrangian at the solution, motivated considerable research activity in formulating a BFGS secant method for problem (1.1) in the framework of the so-called reduced Hessian methods. In contrast to full Hessian methods, the reduced Hessian methods approximate the Hessian restricted to a proper subspace where it is expected to be positive definite. For more on reduced Hessian secant methods see Murray and Wright (1978), Gabay (1982), Coleman and Conn (1984), Nocedal and Overton (1985), Gurwitz (1986), Gilbert (1987), and Byrd and Nocedal (1988). The theoretical convergence rate that has been obtained for the various reduced Hessian BFGS secant methods is two-step q-superlinear convergence in the variable $x$.

Fenyes  (1987)  and  Fontecilla  (1988)  propose  full  Hessian  methods  which 
have  some  of  the  flavor  of  the  reduced  Hessian  methods. 

Recently Tapia (1988) proposed two new classes of SQP secant methods for problem (1.1). One class consists of SQP Lagrangian secant methods with a modification in the scale associated with the particular secant update in question to compensate for the lack of positive definiteness of the Hessian with respect to $x$ of the Lagrangian. The other class consists of SQP structured augmented Lagrangian secant methods. From an algorithmic point of view, these methods possess the flavor of the Powell modified SQP BFGS secant method. However, Tapia was able to prove that for both methods the DFP and the BFGS versions of the algorithms are locally convergent and give q-superlinear convergence in the pair $(x, \lambda)$ and also in the variable $x$ without a positive definite assumption on the Hessian with respect to $x$ of the Lagrangian. Current research activity is attempting to demonstrate that satisfactory rules exist for choosing the parameter in these methods that corresponds to the penalty parameter in the Hessian with respect to $x$ of the augmented Lagrangian.

To our knowledge there are no results in the literature that concern the q-convergence rate of the multiplier in an SQP method. However, several recent articles in the literature made various assumptions concerning this rate. For example, Boggs and Tolle (1985) considered the SQP Lagrangian BFGS and DFP secant methods. They showed that the convergence in $x$ was q-superlinear assuming that the convergence in $x$ and $\lambda$ was q-linear and $\{x_k\}$ satisfied a condition called tangential convergence. This result did not require the Hessian of the Lagrangian with respect to $x$ to be positive definite at the solution. Gill, Murray, Saunders and Wright (1986) proposed an SQP Lagrangian secant method for generating a search direction and determined the step length from a line-search strategy with an augmented Lagrangian as the merit function. They established, under the assumptions that the convergence in $x$ and in $\lambda$ were q-superlinear and $|\Delta x_k| / |\Delta\lambda_k| \ge M > 0$ for $k$ sufficiently large, that eventually their steplength choice would be 1.

Recently Tapia and Whitley (1988) investigated the convergence rate of the projected Newton method applied to the symmetric eigenvalue problem. This algorithm can be viewed as the SQP Newton method followed by a two-norm normalization. They established a q-rate of convergence of $1 + \sqrt{2}$ for both the variable $x$ and the multiplier $\lambda$.

The result of Tapia and Whitley (1988) further motivated and strengthened our already strong desire to determine relationships among the convergence rates for $(x, \lambda)$, for $x$ and for $\lambda$ in SQP methods.

Section 2 deals with preliminaries including our notation and assumptions. In Section 3 we collect three useful characterization theorems. The first result is a straightforward application of the well-known Dennis-Moré characterization. We derive it in a very convenient form which readily lends itself to applications. The second is the Boggs-Tolle-Wang characterization and the third is a useful characterization theorem which follows directly by restricting a theorem of Fontecilla, Steihaug and Tapia to the case of SQP. We maintain that collectively these three theorems offer a powerful tool and demonstrate this fact by using them to derive some interesting consequences. Two conditions which guarantee the q-superlinear convergence of the multiplier sequence $\{\lambda_k\}$ are included. However, we wish to emphasize that the main theme of this work is that while the q-superlinear convergence of $\{\lambda_k\}$ is likely in any given computation, mathematical conditions which assume it or guarantee it are necessarily restrictive.

In Section 4 we show that the convergence in $x$ is at least two-step q-superlinear whenever the convergence in $(x, \lambda)$ is q-superlinear. In Section 5 we show that if the convergence in $(x, \lambda)$ and in $x$ is q-superlinear, then the convergence in $\lambda$ is either q-superlinear or q-sublinear with unbounded $q_1$ factor. We consider these two theorems to be key contributions of the paper. In Section 6 we summarize and present some concluding remarks.

2 Preliminaries


In an effort to simplify our notation and give a cleaner presentation we will work exclusively with the SQP Lagrangian formulation in the remainder of this work, i.e., we will effectively choose $\rho = 0$ in the augmented Lagrangian. No loss of generality will result from this simplification as long as we remember that the requirement that the Hessian of the Lagrangian with respect to $x$ be positive definite at the solution can be dealt with by working with the augmented Lagrangian and choosing $\rho$ sufficiently large.

Let $x_*$ be a local solution of problem (1.1) with associated multiplier $\lambda_*$. We will use the notation $\nabla h_k = \nabla h(x_k)$, $\nabla f_k = \nabla f(x_k)$, $\nabla h_* = \nabla h(x_*)$ and $A_* = \nabla_x^2 \ell(x_*, \lambda_*)$. Both the $\ell_2$ vector norm and the corresponding induced matrix norm will be denoted by $|\cdot|$. We will use $\|\cdot\|$ to denote an arbitrary but fixed matrix norm.

Throughout this work we make the following assumptions:

A1. The functions $f$ and $h$ have continuous second derivatives in an open neighborhood $D$ of a local solution $x_*$ of problem (1.1) and these second derivatives are Lipschitz continuous at $x_*$.

A2. $\nabla h_*$ has full rank.

A3. $z^T A_* z > 0$ for all $z \ne 0$ satisfying $\nabla h_*^T z = 0$.

A4. For large $k$ the sequence $\{(x_k, \lambda_k)\}$ has been generated by a particular SQP quasi-Newton method with invertible $B_k$. Also $\{(x_k, \lambda_k)\}$ converges to $(x_*, \lambda_*)$.

Assumptions A1, A2, and A3 are standard assumptions in the study of quasi-Newton methods for constrained optimization. Assumption A3 is the well-known second-order sufficiency condition.

Since we are only concerned with convergence rates, no generality will be lost and considerable simplicity will be gained by assuming that assumption A4 holds for all $k$ and, from assumptions A1, A2, and A4, that $\nabla h_k$ has full rank for all $k$.

The requirement that $B_k$ be invertible is mild and is effectively implied by second-order sufficiency for the subproblem (1.7). To see this observe that if $B_k$ is positive definite on $\{\eta : \nabla h_k^T \eta = 0\}$, then for sufficiently large $\rho$ the matrix $\hat{B}_k = B_k + \rho \nabla h_k \nabla h_k^T$ is positive definite and therefore invertible. Moreover, the subproblem (1.7) has the same solution using $B_k$ or $\hat{B}_k$. For more detail see Tapia (1977).
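A small numerical check of this remark, as a sketch only: the $2 \times 2$ matrix, the constraint gradient and the helper names below are hypothetical. The matrix is indefinite but positive definite on the null space of the constraint gradient, and adding the rank-one term leaves the quadratic form unchanged on that null space while making the full matrix positive definite for a large enough penalty.

```python
def quad_form(b, z):
    """Return z^T b z for a 2x2 matrix b."""
    return sum(z[i] * b[i][j] * z[j] for i in range(2) for j in range(2))


def is_positive_definite(b):
    """Sylvester's criterion for a symmetric 2x2 matrix."""
    return b[0][0] > 0 and b[0][0] * b[1][1] - b[0][1] * b[1][0] > 0


def augmented(b, grad_h, rho):
    """Return b + rho * grad_h grad_h^T for a single constraint gradient."""
    return [[b[i][j] + rho * grad_h[i] * grad_h[j] for j in range(2)]
            for i in range(2)]


b_k = [[1.0, 2.0], [2.0, 1.0]]   # indefinite (eigenvalues 3 and -1)
grad_h = [1.0, -1.0]             # assumed constraint gradient
z = [1.0, 1.0]                   # spans the null space of grad_h^T

b_hat_small = augmented(b_k, grad_h, 0.25)  # penalty too small for this example
b_hat = augmented(b_k, grad_h, 1.0)         # penalty large enough
```

On the feasible set of (1.7) the product of the constraint gradient with $s$ is fixed, so the added term changes the objective only by a constant; this is why the subproblem solution is unaffected.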

Some of our results will also require that $\{B_k\}$ and $\{B_k^{-1}\}$ be bounded. This requirement is quite mild. In general if the quasi-Newton update satisfies bounded deterioration, then we have that $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded if the initial $(x_0, B_0)$ is close to $(x_*, A_*)$. Moreover, the well-known secant updates Broyden, PSB, DFP and BFGS all satisfy bounded deterioration. For more details see the argument used in Broyden, Dennis and Moré (1973) and Theorems 3.1 and 3.2 in Fontecilla, Steihaug and Tapia (1987).

In our study we will have need to refer to the SQP quasi-Newton method in its equivalent "diagonalized multiplier method" form

$$ \lambda_{k+1} = (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} (h_k - \nabla h_k^T B_k^{-1} \nabla f_k) \tag{2.1} $$

and

$$ x_{k+1} = x_k - B_k^{-1} \nabla_x \ell(x_k, \lambda_{k+1}). \tag{2.2} $$

For details on this equivalence see Tapia (1977), (1978) or Fontecilla, Steihaug and Tapia (1987).
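The equivalence can be checked on a small instance. The sketch below uses assumed data (one constraint in two variables, a hypothetical symmetric positive definite $B_k$, and illustrative helper names). It computes $\lambda_{k+1}$ and the step from (2.1) and (2.2), then verifies that they satisfy the first-order conditions of the quadratic program (1.7): $B_k s + \nabla_x \ell(x_k, \lambda_k) + \nabla h_k \Delta\lambda_k = 0$ and $\nabla h_k^T s + h_k = 0$.

```python
def inv2(b):
    """Inverse of a 2x2 matrix."""
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    return [[b[1][1] / det, -b[0][1] / det],
            [-b[1][0] / det, b[0][0] / det]]


def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


# Hypothetical data at the current iterate (single constraint, m = 1).
lam_k = 1.0
b_k = [[2.0, 0.5], [0.5, 1.0]]
grad_f = [1.0, 1.0]
grad_h = [-3.0, -1.0]
h_k = 0.5

b_inv = inv2(b_k)
# (2.1): multiplier update; for m = 1 the inverse is a scalar division.
lam_next = (h_k - dot(grad_h, matvec(b_inv, grad_f))) / dot(grad_h, matvec(b_inv, grad_h))
# (2.2): step s = x_{k+1} - x_k = -B_k^{-1} grad_x l(x_k, lam_{k+1}).
grad_lag_next = [grad_f[i] + grad_h[i] * lam_next for i in range(2)]
step = [-v for v in matvec(b_inv, grad_lag_next)]

# Residuals of the first-order conditions of the quadratic program (1.7).
grad_lag_k = [grad_f[i] + grad_h[i] * lam_k for i in range(2)]
b_step = matvec(b_k, step)
stationarity = [b_step[i] + grad_lag_k[i] + grad_h[i] * (lam_next - lam_k)
                for i in range(2)]
feasibility = dot(grad_h, step) + h_k
```

Both residuals vanish up to rounding, which is the sense in which (2.1)-(2.2) and the quadratic program produce the same iterate.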

In many of our results we will need to relate the quantity $\lambda_{k+1} - \lambda_*$ to the quantities $x_{k+1} - x_*$ and $x_k - x_*$. The following lemmas are technical results which accomplish this objective and will be useful tools in the proof of several of our results.

Lemma 2.1 There exists a sequence of matrices $\{\Gamma_k\}$ such that $\{\Gamma_k\}$ converges to $A_*$ and

$$ x_{k+1} - x_* + B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*) = (I - B_k^{-1} \Gamma_k)(x_k - x_*). $$

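Proof. From (2.2) we can write

$$ \begin{aligned} x_{k+1} - x_* &= x_k - x_* - B_k^{-1} [\nabla_x \ell(x_k, \lambda_{k+1}) - \nabla_x \ell(x_*, \lambda_*)] \\ &= x_k - x_* - B_k^{-1} [\nabla f_k + \nabla h_k \lambda_{k+1} - \nabla f_* - \nabla h_* \lambda_*] \\ &= B_k^{-1} [B_k (x_k - x_*) - (\nabla f_k - \nabla f_*) - \nabla h_k (\lambda_{k+1} - \lambda_*) - (\nabla h_k - \nabla h_*) \lambda_*] \\ &= B_k^{-1} \Big[ B_k (x_k - x_*) - \int_0^1 \nabla^2 f(x_* + t(x_k - x_*)) \, dt \, (x_k - x_*) \\ &\qquad\quad - \int_0^1 \nabla^2 h(x_* + t(x_k - x_*)) \lambda_* \, dt \, (x_k - x_*) \Big] - B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*), \end{aligned} $$

where the integral of the matrix-valued function is interpreted componentwise. For more details see Chapter 4 of Dennis and Schnabel (1983). Let

$$ \Gamma_k = \int_0^1 \nabla^2 f(x_* + t(x_k - x_*)) \, dt + \int_0^1 \nabla^2 h(x_* + t(x_k - x_*)) \lambda_* \, dt. $$

Then we have

$$ x_{k+1} - x_* = B_k^{-1} [B_k (x_k - x_*) - \Gamma_k (x_k - x_*)] - B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*), $$

or

$$ x_{k+1} - x_* + B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*) = (I - B_k^{-1} \Gamma_k)(x_k - x_*). $$

By the definition of $\Gamma_k$ and the fact that $\{(x_k, \lambda_k)\}$ converges to $(x_*, \lambda_*)$ we have that $\{\Gamma_k\}$ converges to $A_*$. □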

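Lemma 2.2 We have

$$ \lambda_{k+1} - \lambda_* = (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} [\nabla h_k^T (I - B_k^{-1} \Gamma_k)(x_k - x_*) + O(|x_k - x_*|^2)], $$

where $\{\Gamma_k\}$ is as in Lemma 2.1.

Proof. From (2.1) we can write

$$ \begin{aligned} \lambda_{k+1} &= \lambda_k + (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} [h_k - \nabla h_k^T B_k^{-1} \nabla_x \ell(x_k, \lambda_k)] \\ &= \lambda_k + (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} \{ h_k - h_* - \nabla h_k^T B_k^{-1} [\nabla_x \ell(x_k, \lambda_k) - \nabla_x \ell(x_*, \lambda_*)] \}. \end{aligned} $$

Define $\Gamma_k$ by the same formula used in Lemma 2.1 and perform the same algebra to obtain

$$ \lambda_{k+1} = \lambda_k + (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} \Big\{ \int_0^1 \nabla h(x_* + t(x_k - x_*))^T \, dt \, (x_k - x_*) - \nabla h_k^T B_k^{-1} \Gamma_k (x_k - x_*) \Big\} - (\lambda_k - \lambda_*). $$

Hence

$$ \lambda_{k+1} - \lambda_* = (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} [\nabla h_k^T (I - B_k^{-1} \Gamma_k)(x_k - x_*) + O(|x_k - x_*|^2)]. $$

□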

Consider $a_k > 0$ and $b_k > 0$. We write $a_k = O(b_k)$ if there exists $m$ such that $a_k / b_k \le m$ for all $k$. We now use Lemma 2.2 to establish that $|\lambda_{k+1} - \lambda_*| = O(|x_k - x_*|)$ whenever $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded.

Theorem 2.1 If $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded then $|\lambda_{k+1} - \lambda_*| = O(|x_k - x_*|)$.


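Proof. From Lemma 2.2 we have

$$ \lambda_{k+1} - \lambda_* = (\nabla h_k^T B_k^{-1} \nabla h_k)^{-1} [\nabla h_k^T (I - B_k^{-1} \Gamma_k)(x_k - x_*) + O(|x_k - x_*|^2)]. \tag{2.3} $$

From Lemma 3.7 in Fontecilla (1988) we have

$$ |(\nabla h_k^T B_k^{-1} \nabla h_k)^{-1}| \le |(\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T|^2 \, |B_k|. $$

Since $\{B_k\}$ is bounded and $\nabla h_*$ has full rank, it follows that there exists $C_1$ such that

$$ |(\nabla h_k^T B_k^{-1} \nabla h_k)^{-1}| \le C_1. \tag{2.4} $$

Furthermore

$$ |\nabla h_k^T (I - B_k^{-1} \Gamma_k)| = |\nabla h_k^T B_k^{-1} (B_k - \Gamma_k)| \le |\nabla h_k^T| \, |B_k^{-1}| \, |B_k - \Gamma_k|. $$

The facts that $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded and $\{\Gamma_k\}$ converges to $A_*$ lead to

$$ |\nabla h_k^T (I - B_k^{-1} \Gamma_k)| \le C_2 \tag{2.5} $$

for some constant $C_2$.

Combining (2.3), (2.4) and (2.5) we have $|\lambda_{k+1} - \lambda_*| / |x_k - x_*| \le C_3$ for some constant $C_3$. Therefore $|\lambda_{k+1} - \lambda_*| = O(|x_k - x_*|)$. □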

Using Theorem 2.1 we can establish the finiteness of the $q_1$ factor of the sequence $\{x_k\}$.

Corollary 2.1 If $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded, then

$$ |x_{k+1} - x_*| = O(|x_k - x_*|). $$

Proof. By Lemma 2.1 we have

$$ x_{k+1} - x_* = (I - B_k^{-1} \Gamma_k)(x_k - x_*) - B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*). \tag{2.6} $$

From Theorem 2.1 and the fact that $\{B_k^{-1}\}$ is bounded we have

$$ \frac{|B_k^{-1} \nabla h_k (\lambda_{k+1} - \lambda_*)|}{|x_k - x_*|} \le \frac{|B_k^{-1}| \, |\nabla h_k| \, |\lambda_{k+1} - \lambda_*|}{|x_k - x_*|} \le C_1 \tag{2.7} $$

for some constant $C_1$. Furthermore

$$ \frac{|(I - B_k^{-1} \Gamma_k)(x_k - x_*)|}{|x_k - x_*|} = \frac{|B_k^{-1} (B_k - \Gamma_k)(x_k - x_*)|}{|x_k - x_*|}. $$

By assumption $\{B_k^{-1}\}$ is bounded; since $\{B_k\}$ is also bounded and $\{\Gamma_k\}$ converges, we have

$$ \frac{|(I - B_k^{-1} \Gamma_k)(x_k - x_*)|}{|x_k - x_*|} \le C_2 \tag{2.8} $$

for some constant $C_2$. Combining (2.6), (2.7), and (2.8) we have

$$ |x_{k+1} - x_*| = O(|x_k - x_*|). $$

□

The following result holds for any sequences $\{x_k\}$ and $\{\lambda_k\}$, whether or not they were generated by an SQP quasi-Newton method.


Proposition 2.1 If $\{x_k\}$ converges to $x_*$ q-superlinearly and $\{\lambda_k\}$ converges to $\lambda_*$ q-superlinearly, then $\{(x_k, \lambda_k)\}$ converges to $(x_*, \lambda_*)$ q-superlinearly.

Proof. Since q-superlinear convergence is independent of norm, we can work with the max norm.

Let $\|\cdot\|$ denote the max norm. By assumption there exist $\{c_k\}$ and $\{\hat{c}_k\}$ such that

$$ \|x_{k+1} - x_*\| \le c_k \|x_k - x_*\| $$

and

$$ \|\lambda_{k+1} - \lambda_*\| \le \hat{c}_k \|\lambda_k - \lambda_*\| $$

where $\{c_k\}$ and $\{\hat{c}_k\}$ converge to 0. Then

$$ \begin{aligned} \|(x_{k+1}, \lambda_{k+1}) - (x_*, \lambda_*)\| &= \|(x_{k+1} - x_*, \lambda_{k+1} - \lambda_*)\| \\ &= \max\{\|x_{k+1} - x_*\|, \|\lambda_{k+1} - \lambda_*\|\} \\ &\le \max\{c_k \|x_k - x_*\|, \hat{c}_k \|\lambda_k - \lambda_*\|\} \\ &\le (\max\{c_k, \hat{c}_k\}) \, \|(x_k - x_*, \lambda_k - \lambda_*)\|. \end{aligned} $$

It follows that $\{(x_k, \lambda_k)\}$ converges to $(x_*, \lambda_*)$ q-superlinearly. □

3  Fundamental  Characterization  Theorems 
and  Consequences 

In  this  section  we  present  three  important  characterization  theorems  and  then 
derive  several  consequences  of  these  theorems.  One  of  these  theorems,  the 
Boggs-Tolle-Wang  characterization,  is  somewhat  well-known.  Another  one  of 
these  theorems  results  from  a  straightforward  application  of  the  well-known 
Dennis-More  characterization.  We  state  it  in  a  form  which  is  particularly 
convenient.  The  remainder  of  the  results  presented  seem  to  be  unknown. 

It is straightforward that the SQP method can be viewed as a quasi-Newton method applied to the nonlinear equations which represent the first-order necessary conditions for problem (1.1); see Tapia (1977), (1978) for details. Applications of the Dennis-Moré characterization of q-superlinear convergence for quasi-Newton methods have been used in this context by numerous authors. We now use the Dennis-Moré characterization to derive the following characterization.

We write $\Delta x_k$ for $x_{k+1} - x_k$ and similarly for $\Delta\lambda_k$.

Theorem 3.1 For an SQP method the following two statements are equivalent:

(i) $\{(x_k, \lambda_k)\}$ converges q-superlinearly to $(x_*, \lambda_*)$.

(ii) $\displaystyle \lim_{k \to \infty} \left\{ \frac{|(B_k - A_*) \Delta x_k|}{|\Delta x_k|} \min\left(1, \frac{|\Delta x_k|}{|\Delta\lambda_k|}\right) \right\} = 0.$

Proof. To start with observe that both properties (i) and (ii) are norm independent. Hence we may conveniently work with the max norm, which we denote by $\|\cdot\|$.

According to the Dennis-Moré superlinear convergence criterion (see Dennis and Moré (1974)) we have q-superlinear convergence of the sequence $\{(x_k, \lambda_k)\}$ if and only if

$$ \lim_{k \to \infty} \frac{\|(B_k - A_*) \Delta x_k + (\nabla h_k - \nabla h_*) \Delta\lambda_k\|}{\|(\Delta x_k, \Delta\lambda_k)\|} = 0 \tag{3.1} $$

and

$$ \lim_{k \to \infty} \frac{\|(\nabla h_k - \nabla h_*)^T \Delta x_k\|}{\|(\Delta x_k, \Delta\lambda_k)\|} = 0. \tag{3.2} $$

See equations (129)-(131) of Tapia (1977) for further details. Clearly (3.2) holds and (3.1) is equivalent to

$$ \lim_{k \to \infty} \frac{\|(B_k - A_*) \Delta x_k\|}{\|(\Delta x_k, \Delta\lambda_k)\|} = 0. \tag{3.3} $$

Dividing the numerator and the denominator in (3.3) by $\|\Delta x_k\|$ we see that (3.3) is equivalent to condition (ii). The Dennis-Moré characterization requires $\nabla^2 \ell(x_*, \lambda_*)$ to be nonsingular. This follows from A2 and A3. □
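The last step of the proof rests on an elementary identity for the max norm, written out here as a sketch. It assumes $\Delta x_k \ne 0$ and $\Delta\lambda_k \ne 0$, and it uses the fact that the max norm of a pair is the larger of the max norms of its two components:

```latex
\frac{\|(B_k - A_*)\Delta x_k\|}{\|(\Delta x_k, \Delta\lambda_k)\|}
  = \frac{\|(B_k - A_*)\Delta x_k\|}{\max\{\|\Delta x_k\|, \|\Delta\lambda_k\|\}}
  = \frac{\|(B_k - A_*)\Delta x_k\|}{\|\Delta x_k\|}
    \,\min\!\left(1, \frac{\|\Delta x_k\|}{\|\Delta\lambda_k\|}\right).
```

Since both conditions are norm independent, the same statement holds with the $\ell_2$ norm used in condition (ii).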

The following is the well-known Boggs-Tolle-Wang characterization. For a self-contained and short proof see Stoer and Tapia (1987).

Let

$$ P_k = I - \nabla h_k (\nabla h_k^T \nabla h_k)^{-1} \nabla h_k^T. \tag{3.4} $$

Theorem 3.2 (Boggs-Tolle-Wang). For an SQP method the following two statements are equivalent:

(i) $\{x_k\}$ converges q-superlinearly to $x_*$.

(ii) $\displaystyle \lim_{k \to \infty} \frac{|P_k (B_k - A_*) \Delta x_k|}{|\Delta x_k|} = 0.$
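The matrix in (3.4) is the orthogonal projector onto the null space of the transposed constraint gradient. A minimal sketch for a single constraint follows; the gradient values and the helper names are assumptions made for illustration.

```python
def projector(g):
    """Return I - g (g^T g)^{-1} g^T for a single constraint gradient g."""
    gg = sum(v * v for v in g)
    n = len(g)
    return [[(1.0 if i == j else 0.0) - g[i] * g[j] / gg for j in range(n)]
            for i in range(n)]


def matvec(a, v):
    return [sum(a[i][j] * v[j] for j in range(len(v))) for i in range(len(a))]


grad_h = [-3.0, -1.0]          # hypothetical constraint gradient
p_k = projector(grad_h)

w = [2.0, 0.0]                 # arbitrary vector
pw = matvec(p_k, w)            # component of w in the null space of grad_h^T
ppw = matvec(p_k, pw)          # projecting twice changes nothing
p_grad = matvec(p_k, grad_h)   # the constraint gradient itself is annihilated
```

Condition (ii) of Theorem 3.2 therefore measures only the part of $(B_k - A_*)\Delta x_k$ that lies in the null space of $\nabla h_k^T$.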

The SQP Broyden, PSB, DFP and BFGS secant methods are known to satisfy the condition

$$ \lim_{k \to \infty} \frac{|(B_k - A_*) \Delta x_k|}{|\Delta x_k|} = 0, \tag{3.5} $$

hence by Theorem 3.1 and Theorem 3.2 they give q-superlinear convergence in $x$ and in $(x, \lambda)$. For details see Corollary 5.5 in Fontecilla, Steihaug and Tapia (1987).

These comments bring to the foreground the following common concern. Many researchers in the area find it unsettling that in every known situation where the q-superlinear convergence of $x$ has been established for an SQP method, it was established by first demonstrating that (3.5) holds. Clearly (3.5) implies the Boggs-Tolle-Wang condition ((ii) of Theorem 3.2). This means that we are sacrificing information by using the Boggs-Tolle-Wang characterization, since we must have more than just q-superlinear convergence in $x$. Of course, the pertinent issue here is a characterization of condition (3.5) in terms of the convergence aspects of the SQP method in question. We find it interesting that such a characterization has not been identified and yet it can be obtained as a straightforward consequence of Theorem 5.1 of Fontecilla, Steihaug and Tapia (1987). Specifically, by restricting their theorem to the case of SQP and making some rather obvious observations we obtain the following result.


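Theorem 3.3 (Fontecilla-Steihaug-Tapia). For an SQP method the following two statements are equivalent:

(i) $\displaystyle \lim_{k \to \infty} \frac{|(B_k - A_*) \Delta x_k|}{|\Delta x_k|} = 0.$

(ii) (a) $\{x_k\}$ converges q-superlinearly to $x_*$ and
(b) $\displaystyle \lim_{k \to \infty} \frac{|\lambda_{k+1} - \lambda_*|}{|x_k - x_*|} = 0.$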

We  now  present  several  consequences  of  Theorem  3.1  and  Theorem  3.3. 
Before  we  isolated  these  two  theorems  we  had  derived  these  results  using 
only  the  Dennis-More  characterization.  This  was  a  lengthy  task.  Hence 
these  consequences  also  can  be  viewed  as  a  demonstration  that  collectively 
these  theorems  offer  a  powerful  analytical  tool. 


Proposition 3.1 For an SQP method the statement

(i) $\displaystyle \lim_{k \to \infty} \frac{|(B_k - A_*) \Delta x_k|}{|\Delta x_k|} = 0$

or the statement

(ii) (a) $\{B_k\}$ is bounded and
(b) $\displaystyle \lim_{k \to \infty} \frac{|\Delta x_k|}{|\Delta\lambda_k|} = 0$

implies the statement

(iii) $\{(x_k, \lambda_k)\}$ converges q-superlinearly to $(x_*, \lambda_*)$.

Proof. The proof follows directly from Theorem 3.1. □

We now extend the Boggs-Tolle-Wang characterization of q-superlinear convergence of $\{x_k\}$ to the common case where it is known that the pair $(x, \lambda)$ converges q-superlinearly.

Proposition 3.2 For an SQP method assume that $\{(x_k, \lambda_k)\}$ converges q-superlinearly to $(x_*, \lambda_*)$. Then the following two statements are equivalent:

(i) $\{x_k\}$ converges q-superlinearly to $x_*$.

(ii)

$$ \lim_{k \to \infty} \frac{|P_{n_k} (B_{n_k} - A_*) \Delta x_{n_k}|}{|\Delta x_{n_k}|} = 0 \tag{3.6} $$

whenever $\{n_k\}$ is a sequence of positive integers such that

$$ \lim_{k \to \infty} \frac{|\Delta x_{n_k}|}{|\Delta\lambda_{n_k}|} = 0. \tag{3.7} $$


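Proof. By Theorem 3.2 (i) implies (ii). Now suppose that

$$ \limsup_{k \to \infty} \frac{|P_k (B_k - A_*) \Delta x_k|}{|\Delta x_k|} = \delta > 0. \tag{3.8} $$

Then for some sequence of positive integers $\{n_k\}$

$$ \lim_{k \to \infty} \frac{|P_{n_k} (B_{n_k} - A_*) \Delta x_{n_k}|}{|\Delta x_{n_k}|} = \delta > 0. \tag{3.9} $$

Since $P_{n_k}$ is an orthogonal projection, $|P_{n_k} v| \le |v|$ for every vector $v$, and it follows that

$$ \limsup_{k \to \infty} \frac{|(B_{n_k} - A_*) \Delta x_{n_k}|}{|\Delta x_{n_k}|} = \delta' > 0. $$

Choosing a subsequence of $\{n_k\}$, and again calling it $\{n_k\}$, we have

$$ \lim_{k \to \infty} \frac{|(B_{n_k} - A_*) \Delta x_{n_k}|}{|\Delta x_{n_k}|} = \delta' > 0. $$

From Theorem 3.1 we see that (3.9) implies (3.7); hence (3.6). However (3.6) and (3.9) are incompatible. It follows that our supposition (3.8) cannot hold and (ii) of Theorem 3.2 must hold. □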

Observe that (ii) vacuously holds if

$$ \frac{|\Delta x_k|}{|\Delta\lambda_k|} \ge m > 0 \quad \text{for all } k. \tag{3.10} $$

Hence (3.10) and the q-superlinear convergence of $\{(x_k, \lambda_k)\}$ imply the q-superlinear convergence of $\{x_k\}$.


Proposition 3.3 For an SQP method assume the condition

$$ \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|} \le M < +\infty \quad \text{for all } k. \tag{3.11} $$

Then the statement

(i) $\displaystyle \lim_{k \to \infty} \frac{|(B_k - A_*) \Delta x_k|}{|\Delta x_k|} = 0$

implies the statement

(ii) (a) $\{x_k\}$ converges q-superlinearly to $x_*$ and
(b) $\{\lambda_k\}$ converges q-superlinearly to $\lambda_*$.

Conversely, if instead of (3.11) we assume

$$ 0 < m \le \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|} \quad \text{for all } k \tag{3.12} $$

then the statement (ii) implies the statement (i). Consequently assuming (3.11) and (3.12) we have that (i) is equivalent to (ii).

Proof. Consider the expression

$$ \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = \frac{|\lambda_{k+1} - \lambda_*|}{|x_k - x_*|} \cdot \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|}. \tag{3.13} $$

If (3.11) holds, then Theorem 3.3, statement (i) and (3.13) imply (ii). Now, if (3.12) holds, then Theorem 3.3, statement (ii) and (3.13) imply (i) holds. □

We  believe  that  in  most  cases  (i)  and  (ii)  will  be  equivalent. 

The  following  result  does  not  use  Theorems  3.1 — 3.3. 


Proposition 3.4 For an SQP method the statement

(i) (a) $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded and
(b)

$$ \lim_{k \to \infty} \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|} = 0 \tag{3.14} $$

implies the statement

(ii) (a) $\{x_k\}$ converges q-superlinearly to $x_*$ and
(b) $\{\lambda_k\}$ converges q-superlinearly to $\lambda_*$.

Proof. From Theorem 2.1 we see that (i)(a) implies that $|\lambda_{k+1} - \lambda_*| / |x_k - x_*|$ is bounded uniformly in $k$. Hence (i)(b) and (3.13) imply the q-superlinear convergence of $\{\lambda_k\}$.

Moreover, the fact that $|\lambda_{k+1} - \lambda_*| = O(|x_k - x_*|)$ implies that

$$ \frac{|x_{k+1} - x_*|}{|x_k - x_*|} = O\left( \frac{|x_{k+1} - x_*|}{|\lambda_{k+1} - \lambda_*|} \right). $$

So

$$ \lim_{k \to \infty} \frac{|x_{k+1} - x_*|}{|x_k - x_*|} = 0. $$

This proves (ii). □

Observe that the conditions which allow us to establish q-superlinear convergence of $\{\lambda_k\}$, i.e. (3.11) and (3.14), preclude $\lambda_k = \lambda_*$ an infinite number of times. Indeed, they do not allow $\{\lambda_k\}$ to converge too fast relative to $\{x_k\}$. In Section 5 we will argue that while this restriction may hold most of the time it is not mathematically realistic.

From Theorem 3.1 we see that the assumption

$$0 < m \le \frac{|\Delta x_k|}{|\Delta \lambda_k|} \quad \text{for all } k \tag{3.15}$$

is sufficiently strong to make the q-superlinear convergence of
$\{(x_k, \lambda_k)\}$ equivalent to the condition

$$\lim_{k\to\infty} \frac{|(B_k - A_*)\Delta x_k|}{|\Delta x_k|} = 0 . \tag{3.16}$$


Hence, it must be considered somewhat restrictive. We also believe that the
condition

$$\lim_{k\to\infty} \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|} = 0 \tag{3.17}$$

used in Proposition 3.4 is restrictive. It is interesting that (3.15) and
(3.17) are incompatible. Specifically, if (3.17) holds and $\{B_k\}$ and
$\{B_k^{-1}\}$ are bounded, then from Proposition 3.4 we have that
$\{x_k\}$ and $\{\lambda_k\}$ converge q-superlinearly. Hence

$$\lim_{k\to\infty} \frac{|\Delta x_k|}{|x_k - x_*|} = 1 \tag{3.18}$$

and similarly for $\{\lambda_k\}$ (see Lemma 8.2.3 of Dennis and Schnabel
(1983)). It follows that (3.17) implies

$$\lim_{k\to\infty} \frac{|\Delta x_k|}{|\Delta \lambda_k|} = 0 \tag{3.19}$$

and (3.15) cannot hold.
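The limit (3.18) can be checked numerically on a scalar model sequence. The
sketch below is only an illustration under an assumption: the iterates
$x_k = 2^{-k^2}$ with $x_* = 0$ are a hypothetical q-superlinearly
convergent sequence chosen here, not the output of an SQP method.

```python
# Hypothetical scalar sequence x_k = 2^(-k^2) with limit x_* = 0 (an assumption
# chosen for illustration; it is not generated by an SQP iteration).
X_STAR = 0.0


def x(k):
    return 2.0 ** (-(k * k))


ks = list(range(1, 12))

# q1 factors |x_{k+1} - x_*| / |x_k - x_*|; these tend to 0 (q-superlinear).
q_factors = [abs(x(k + 1) - X_STAR) / abs(x(k) - X_STAR) for k in ks]

# Step-to-error ratios |x_{k+1} - x_k| / |x_k - x_*|; these tend to 1, as in (3.18).
step_ratios = [abs(x(k + 1) - x(k)) / abs(x(k) - X_STAR) for k in ks]

for k, q, r in zip(ks, q_factors, step_ratios):
    print(f"k={k:2d}  q1 factor={q:.3e}  step/error={r:.12f}")
```

For this model sequence the step-to-error ratio differs from 1 by exactly the
q1 factor, which is why the two columns approach 0 and 1 together.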


4  Implications of $(x, \lambda)$ on $x$

Han (1976), (1977), Tapia (1977) and Glad (1978) independently established
local and q-superlinear convergence for the pair $(x, \lambda)$ for various
SQP secant methods as mentioned in Section 1. In general, q-superlinear
convergence for the pair $(x, \lambda)$ only implies r-superlinear
convergence for $x$ (or for $\lambda$). However, we will now show that for
the SQP quasi-Newton method the q-superlinear convergence of the pair
$(x, \lambda)$ always implies at least two-step q-superlinear convergence
for $x$ provided that $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded. The result
will follow directly from the following lemma.


Lemma 4.1 Let $a_k > 0$, $b_k > 0$, $a_{k+1} = O(a_k)$ and
$b_{k+1} = O(a_k)$. If $\{(a_k, b_k)\}$ converges to 0 q-superlinearly, then
the convergence of $\{a_k\}$ to 0 is at least two-step q-superlinear.
Moreover

$$\lim_{k\to\infty} \frac{b_{k+1}}{a_{k-1}} = 0 .$$


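Proof. Let $b_{k+1} = M_k a_k$. By assumption

$$\lim_{k\to\infty} \sqrt{\frac{a_{k+1}^2 + b_{k+1}^2}{a_k^2 + b_k^2}} = 0 .$$

Hence

$$\lim_{k\to\infty} \frac{a_{k+1}^2 + M_k^2 a_k^2}{a_k^2 + M_{k-1}^2 a_{k-1}^2} = 0 . \tag{4.1}$$

Since $a_k = O(a_{k-1})$ and $\{M_k\}$ is bounded, it follows from (4.1)
that $\lim_{k\to\infty} a_{k+1}/a_{k-1} = 0$ and
$\lim_{k\to\infty} b_{k+1}/a_{k-1} = 0$. □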
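As a purely numerical illustration of Lemma 4.1, the sketch below builds a
synthetic pair of sequences that satisfies the hypotheses of the lemma but
for which $\{a_k\}$ is not one-step q-superlinear. The sequences, the helper
names and the use of the max norm for the pair are all assumptions made here
for illustration; nothing below comes from an SQP run.

```python
def pair_norm(k):
    # Hypothetical error level N_k = 2^(-k^2); N_{k+1}/N_k = 2^(-(2k+1)) -> 0.
    return 2.0 ** (-(k * k))


def a(k):
    # a_k = N_k on even k and a_k = N_{k+1} on odd k, so a_{k+1} = a_k for odd k.
    return pair_norm(k) if k % 2 == 0 else pair_norm(k + 1)


def b(k):
    # b_k = N_k, so that b_{k+1} <= a_k and the max norm of (a_k, b_k) is N_k.
    return pair_norm(k)


def max_norm(k):
    return max(a(k), b(k))


ks = list(range(1, 15))
pair_ratios = [max_norm(k + 1) / max_norm(k) for k in ks]  # tends to 0
one_step = [a(k + 1) / a(k) for k in ks]                   # equals 1 for odd k
two_step = [a(k + 2) / a(k) for k in ks]                   # tends to 0
cross = [b(k + 1) / a(k - 1) for k in ks]                  # tends to 0

for k, p, o, t, c in zip(ks, pair_ratios, one_step, two_step, cross):
    print(f"k={k:2d} pair={p:.2e} one-step={o:.2e} two-step={t:.2e} b/a={c:.2e}")
```

The one-step ratios of $a_k$ return to 1 at every odd index, while the
two-step ratios and the ratios $b_{k+1}/a_{k-1}$ both decay, which is the
behavior the lemma predicts.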


Theorem 4.1 For an SQP method the statement

(i) (a) $\{B_k\}$ and $\{B_k^{-1}\}$ are bounded and

    (b) $\{(x_k, \lambda_k)\}$ converges q-superlinearly to $(x_*, \lambda_*)$

implies the statement

(ii) (a) $\{x_k\}$ converges to $x_*$ at least two-step q-superlinearly and

     (b) $\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|x_{k-1} - x_*|} = 0$.

Proof. Let $a_k = |x_k - x_*|$ and $b_k = |\lambda_k - \lambda_*|$. From
Theorem 2.1 and Corollary 2.1 we have

$$b_{k+1} = O(a_k) \quad \text{and} \quad a_{k+1} = O(a_k) .$$

Therefore from Lemma 4.1 we have (ii).

□

5  Implications of $(x, \lambda)$ and $x$ on $\lambda$

In this section we establish what we consider to be our main result. We
will show that if we have q-superlinear convergence in the pair
$(x, \lambda)$ and in the variable $x$, then we have q-superlinear or
q-sublinear with unbounded $q_1$ factor convergence in the multiplier
$\lambda$. The following lemma is a technical result which will be used
later.


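Lemma 5.1 Let $\{x_{n_k}\}$ and $\{\lambda_{n_k}\}$ be subsequences of
$\{x_k\}$ and $\{\lambda_k\}$. Assume $\{B_k^{-1}\}$ is bounded. If

$$\lim_{k\to\infty} \frac{|\lambda_{n_k} - \lambda_*|}{|x_{n_k} - x_*|} = 0 ,$$

then

$$\lim_{k\to\infty} \frac{|(I - B_{n_k-1}^{-1}\Gamma_{n_k-1})(x_{n_k-1} - x_*)|}{|x_{n_k} - x_*|} = 1 ,$$

where $\Gamma_k$ is defined in Lemma 2.1.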

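Proof. By Lemma 2.1 we have

$$x_{n_k} - x_* - (I - B_{n_k-1}^{-1}\Gamma_{n_k-1})(x_{n_k-1} - x_*) = -B_{n_k-1}^{-1}\nabla h_{n_k-1}(\lambda_{n_k} - \lambda_*) . \tag{5.1}$$

By assumption $\lim_{k\to\infty} |\lambda_{n_k} - \lambda_*| / |x_{n_k} - x_*| = 0$
and $\{B_k^{-1}\}$ is bounded. Therefore we have

$$\lim_{k\to\infty} \frac{|B_{n_k-1}^{-1}\nabla h_{n_k-1}(\lambda_{n_k} - \lambda_*)|}{|x_{n_k} - x_*|} = 0 .$$

It follows from (5.1) that

$$\lim_{k\to\infty} \frac{|x_{n_k} - x_* - (I - B_{n_k-1}^{-1}\Gamma_{n_k-1})(x_{n_k-1} - x_*)|}{|x_{n_k} - x_*|} = 0 .$$

Therefore

$$\lim_{k\to\infty} \frac{|(I - B_{n_k-1}^{-1}\Gamma_{n_k-1})(x_{n_k-1} - x_*)|}{|x_{n_k} - x_*|} = 1 .$$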


□

Lemma 5.2 Let $Q_k v_k = w_k$ where $\{Q_k\}$ is a sequence of $n \times m$
matrices with full column rank and $\{v_k\}$ is a sequence of vectors.
Assume $v_k \ne 0$ for $k$ sufficiently large and $\{Q_k\}$ is bounded and
the limit points of $\{Q_k\}$ have full column rank. If $\{w_k\}$ converges
to 0 q-superlinearly then $\{v_k\}$ converges to 0 q-superlinearly.

Proof. By the assumption that $\{Q_k\}$ is bounded and each $Q_k$ is of full
column rank and the limit points of $\{Q_k\}$ have full column rank we have

$$m|v_k| \le |w_k| \le M|v_k|$$

for some positive constants $m$ and $M$ independent of $k$. Therefore the
fact that $\{w_k\}$ converges to 0 q-superlinearly will imply that
$\{v_k\}$ converges to 0 q-superlinearly. □
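The two-sided bound in the proof of Lemma 5.2 can be made concrete for a
single matrix, where $m$ and $M$ may be taken as the smallest and largest
singular values. The sketch below is a hedged illustration: the $3 \times 2$
matrix `Q` and the test vectors are hypothetical, and a single fixed matrix
stands in for the bounded sequence $\{Q_k\}$ of the lemma.

```python
import math

# Hypothetical 3 x 2 matrix with full column rank (an assumption for illustration).
Q = [[1.0, 0.0],
     [0.0, 2.0],
     [1.0, 1.0]]


def matvec(A, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in A]


def norm2(v):
    return math.sqrt(sum(t * t for t in v))


# Entries of the 2 x 2 Gram matrix Q^T Q = [[p, q], [q, r]].
p = sum(row[0] * row[0] for row in Q)
q = sum(row[0] * row[1] for row in Q)
r = sum(row[1] * row[1] for row in Q)

# Eigenvalues of the symmetric 2 x 2 Gram matrix in closed form; their square
# roots are the singular values of Q.
half_trace = (p + r) / 2.0
radius = math.sqrt(((p - r) / 2.0) ** 2 + q * q)
m = math.sqrt(half_trace - radius)  # smallest singular value, positive by full rank
M = math.sqrt(half_trace + radius)  # largest singular value

test_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0], [3.0, 0.5], [-2.0, 7.0]]
bounds_hold = all(
    m * norm2(v) - 1e-12 <= norm2(matvec(Q, v)) <= M * norm2(v) + 1e-12
    for v in test_vectors
)
print(f"m = {m:.6f}, M = {M:.6f}, bounds hold: {bounds_hold}")
```

With such constants, $|v_{k+1}|/|v_k| \le (M/m)\,|w_{k+1}|/|w_k|$, so a
vanishing ratio for $\{w_k\}$ forces a vanishing ratio for $\{v_k\}$.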


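Theorem 5.1 If $\{(x_k, \lambda_k)\}$ and $\{x_k\}$ converge to
$(x_*, \lambda_*)$ and $x_*$ q-superlinearly and $\{B_k\}$ and
$\{B_k^{-1}\}$ are bounded, then either

(i) $$\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = 0$$
(i.e. $\{\lambda_k\}$ converges q-superlinearly)

or

(ii) $$\limsup_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = \infty$$
(i.e. $\{\lambda_k\}$ converges q-sublinearly with unbounded $q_1$ factor).

Moreover, in both cases
$\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|x_{k-1} - x_*|} = 0$.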

Proof. If $|\lambda_{k+1} - \lambda_*| = O(|\lambda_k - \lambda_*|)$ is not
true, then

$$\limsup_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = \infty ,$$

and case (ii) of the theorem holds.

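Now suppose that $|\lambda_{k+1} - \lambda_*| = O(|\lambda_k - \lambda_*|)$.
By hypothesis we have

$$\begin{aligned}
|\lambda_{k+1} - \lambda_*| &\le |(x_{k+1} - x_*, \lambda_{k+1} - \lambda_*)| \\
&\le c_k |(x_k - x_*, \lambda_k - \lambda_*)| \\
&\le c_k \left( |x_k - x_*| + |\lambda_k - \lambda_*| \right) ,
\end{aligned}$$

where $\{c_k\}$ converges to 0. It follows that

$$\frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} \le c_k \left( 1 + \frac{|x_k - x_*|}{|\lambda_k - \lambda_*|} \right) . \tag{5.2}$$

Suppose

$$\limsup_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = \delta > 0 .$$

Then there exists a sequence of positive integers $\{n_k\}$ such that

$$\lim_{k\to\infty} \frac{|\lambda_{n_k+1} - \lambda_*|}{|\lambda_{n_k} - \lambda_*|} = \delta . \tag{5.3}$$

Case (i) If

$$\limsup_{k\to\infty} \frac{|x_{n_k} - x_*|}{|\lambda_{n_k} - \lambda_*|} < \infty ,$$

then from (5.2) we have

$$\lim_{k\to\infty} \frac{|\lambda_{n_k+1} - \lambda_*|}{|\lambda_{n_k} - \lambda_*|} = 0 .$$

However, this contradicts (5.3).

Case (ii) If

$$\limsup_{k\to\infty} \frac{|x_{n_k} - x_*|}{|\lambda_{n_k} - \lambda_*|} = \infty ,$$

then there exists a subsequence of $\{n_k\}$, say $\{m_k\}$, such that

$$\lim_{k\to\infty} \frac{|\lambda_{m_k} - \lambda_*|}{|x_{m_k} - x_*|} = 0 . \tag{5.4}$$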


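From (5.4) and the assumption that
$|\lambda_{k+1} - \lambda_*| = O(|\lambda_k - \lambda_*|)$ we have

$$\lim_{k\to\infty} \frac{|\lambda_{m_k+1} - \lambda_*|}{|x_{m_k} - x_*|} = 0 . \tag{5.5}$$

Now, since $\{x_k\}$ converges q-superlinearly we have from (5.5) and (3.18)

$$\lim_{k\to\infty} \frac{|\Delta x_{m_k}|}{|\Delta \lambda_{m_k}|} = \infty . \tag{5.6}$$

Theorem 3.1, (5.6) and the fact that we are assuming that
$\{(x_k, \lambda_k)\}$ converges q-superlinearly imply

$$\lim_{k\to\infty} \frac{|(B_{m_k} - A_*)\Delta x_{m_k}|}{|\Delta x_{m_k}|} = 0 . \tag{5.7}$$

Since $\{\Gamma_k\}$ defined in Lemma 2.1 converges to $A_*$ and $\{x_k\}$
converges q-superlinearly (see (3.18))

$$\lim_{k\to\infty} \frac{|(B_{m_k} - \Gamma_{m_k})(x_{m_k} - x_*)|}{|x_{m_k} - x_*|} = 0 . \tag{5.8}$$

Let $w_{k+1} = x_{k+1} - x_* - B_k^{-1}(B_k - \Gamma_k)(x_k - x_*)$ and
$Q_k = -B_k^{-1}\nabla h_k$. Then from Lemma 2.1 we have

$$w_{k+1} = Q_k(\lambda_{k+1} - \lambda_*) . \tag{5.9}$$

Now,

$$\begin{aligned}
\frac{|w_{k+1}|}{|w_k|} &= \frac{|x_{k+1} - x_* - B_k^{-1}(B_k - \Gamma_k)(x_k - x_*)|}{|x_k - x_* - B_{k-1}^{-1}(B_{k-1} - \Gamma_{k-1})(x_{k-1} - x_*)|} \\
&= \frac{|\Delta x_k + B_k^{-1}\Gamma_k(x_k - x_*)|}{|\Delta x_{k-1} + B_{k-1}^{-1}\Gamma_{k-1}(x_{k-1} - x_*)|} .
\end{aligned}$$

Since $\{x_k\}$ converges to $x_*$ q-superlinearly we have

$$\begin{aligned}
\lim_{k\to\infty} \frac{|w_{m_k+1}|}{|w_{m_k}|} &= \lim_{k\to\infty} \frac{|(B_{m_k}^{-1}\Gamma_{m_k} - I)(x_{m_k} - x_*)|}{|(B_{m_k-1}^{-1}\Gamma_{m_k-1} - I)(x_{m_k-1} - x_*)|} \\
&= \lim_{k\to\infty} \frac{|(B_{m_k}^{-1}\Gamma_{m_k} - I)(x_{m_k} - x_*)| \,/\, |x_{m_k} - x_*|}{|(B_{m_k-1}^{-1}\Gamma_{m_k-1} - I)(x_{m_k-1} - x_*)| \,/\, |x_{m_k} - x_*|} .
\end{aligned} \tag{5.10}$$

From (5.4) and Lemma 5.1 we know that the denominator in (5.10) converges
to 1. It follows from (5.8), the fact that $\{B_k^{-1}\}$ is bounded, and
(5.10) that

$$\lim_{k\to\infty} \frac{|w_{m_k+1}|}{|w_{m_k}|} = 0 .$$

Since $\{B_k\}$ is bounded and $\nabla h_*$ has full rank, the limit points
of $Q_k$ defined by (5.9) have full rank. It follows from Lemma 5.2 and
(5.9) that

$$\lim_{k\to\infty} \frac{|\lambda_{m_k+1} - \lambda_*|}{|\lambda_{m_k} - \lambda_*|} = 0 .$$

This contradicts (5.3).

Both case (i) and case (ii) lead to a contradiction. Hence we must have
$\delta = 0$ and it follows that

$$\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = 0 ,$$

i.e. case (i) of the theorem holds. The last statement of the theorem
follows from Theorem 4.1. □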

Let  us  end  this  section  by  collecting  all  our  results  and  stating  them  for 
the  popular  SQP  secant  methods. 


Theorem 5.2 Consider the SQP Broyden, PSB, DFP or BFGS secant method.
In the case of DFP and BFGS assume that the matrix $A_*$ is positive
definite. Then there exist positive numbers $\epsilon$ and $\delta$ such
that whenever $|x_0 - x_*| < \epsilon$ and $|B_0 - A_*| < \delta$ the
iteration sequence $\{(x_k, \lambda_k)\}$ is well-defined and converges to
$(x_*, \lambda_*)$. In addition we have

(i) $\{(x_k, \lambda_k)\}$ converges to $(x_*, \lambda_*)$ q-superlinearly,

(ii) $\{x_k\}$ converges to $x_*$ q-superlinearly,

(iii) $\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|x_k - x_*|} = 0$,

(iv) $\{\lambda_k\}$ converges to $\lambda_*$ q-superlinearly or
q-sublinearly with unbounded $q_1$ factor.


Proof. It is known that these secant methods satisfy bounded deterioration
and condition (3.5). The bounded deterioration implies that $\{B_k\}$ and
$\{B_k^{-1}\}$ are bounded. For details see Theorem 3.1, Proposition 4.2 and
Corollary 5.5 of Fontecilla, Steihaug and Tapia (1987). The theorem now
follows from Theorem 3.1, Theorem 3.3, and Theorem 5.1. □

6  Summary and Concluding Remarks

In Section 3 we exhibited three important characterization theorems and
demonstrated that collectively they offer a very powerful analytic tool for
studying the convergence properties of SQP methods. We derived several
conditions which imply q-superlinear convergence in $(x, \lambda)$, $x$ and
$\lambda$ for an SQP method. We also noted that in essentially all
applications, the Boggs-Tolle-Wang condition for q-superlinear convergence
in $x$ has been established by first showing that condition (3.5) holds.
Moreover, some of the theory presented in this paper implies that condition
(3.5) should be expected to hold. Consequently, we have argued that the
Fontecilla-Steihaug-Tapia theorem (Theorem 3.3), which offers a
characterization of condition (3.5), is probably a more useful tool than the
Boggs-Tolle-Wang characterization theorem (Theorem 3.2).

In general if we have q-superlinear convergence in the pair $(x, \lambda)$
we only have r-superlinear convergence in the variable $x$ and the
multiplier $\lambda$. However, Theorem 4.1 shows that if $\{B_k\}$ and
$\{B_k^{-1}\}$ are bounded, then for an SQP method we always have at least
two-step q-superlinear convergence in the variable $x$ whenever we have
q-superlinear convergence in the pair $(x, \lambda)$.

In Section 5 we showed that the convergence for the multiplier $\lambda$
was either q-superlinear or q-sublinear with unbounded $q_1$ factor whenever
the convergence for the pair $(x, \lambda)$ and the variable $x$ were
q-superlinear. We consider this theorem to be the main contribution of the
paper. Initially we found this result to be somewhat of a surprise. Indeed,
authors have assumed that this convergence is q-superlinear or at least
q-linear. However, after studying the mechanics of the SQP method we have
convinced ourselves that this result should have been expected. Let us now
present some discussion along this line.

A highly desirable feature of an iterative procedure is the property that
should an iterate happen to coincide with a solution, then the subsequent
iterate is also equal to the solution. Clearly, an iterative procedure which
lacks this fundamental property cannot have good theoretical q-convergence
behavior. The error could be zero at one iteration and nonzero in the
subsequent iteration. This implies that in any analysis which considers the
worst case, the $q_1$-factor would be unbounded. Even if the error were not
zero at any iteration it could be arbitrarily small and one would expect
similar statements to hold.

Let us now look at the SQP iterative procedure in terms of $(x, \lambda)$,
$x$ and $\lambda$ from this point of view. It follows that if $x_k = x_*$,
then $\nabla f_k = -\nabla h_k \lambda_*$ and $h_k = 0$; so from (2.1)
$\lambda_{k+1} = \lambda_*$ and from (2.2) $x_{k+1} = x_*$. Therefore the
establishment of good q-convergence behavior in $(x, \lambda)$ and in $x$
for the SQP method should not be viewed as a complete surprise.
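The fixed-point property just described can be sketched on a toy problem.
Everything in the block below is an assumption made for illustration: the
problem (minimize $x_1 + x_2$ subject to $x_1^2 + x_2^2 - 2 = 0$, with
solution $x_* = (-1, -1)$ and $\lambda_* = 1/2$), the diagonal matrix $B$,
the helper name `sqp_step`, and the reading of (2.1) and (2.2) as the
standard SQP linear system $B s + \nabla h\, \lambda_{+} = -\nabla f$,
$\nabla h^T s = -h$.

```python
from fractions import Fraction as F


def grad_f(x):
    # Hypothetical objective f(x) = x1 + x2.
    return [F(1), F(1)]


def h(x):
    # Hypothetical single constraint h(x) = x1^2 + x2^2 - 2.
    return x[0] * x[0] + x[1] * x[1] - 2


def grad_h(x):
    return [2 * x[0], 2 * x[1]]


def sqp_step(x, b_diag):
    """One SQP step with a diagonal matrix B = diag(b_diag).

    Solves  B s + grad_h * lam_next = -grad_f  and  grad_h^T s = -h
    by eliminating s. The current multiplier is not an input.
    """
    g, a, c = grad_f(x), grad_h(x), h(x)
    binv_g = [gi / bi for gi, bi in zip(g, b_diag)]
    binv_a = [ai / bi for ai, bi in zip(a, b_diag)]
    lam_next = (c - sum(ai * t for ai, t in zip(a, binv_g))) / sum(
        ai * t for ai, t in zip(a, binv_a)
    )
    s = [-(gi + ai * lam_next) / bi for gi, ai, bi in zip(g, a, b_diag)]
    return [xi + si for xi, si in zip(x, s)], lam_next


x_star = [F(-1), F(-1)]
lam_star = F(1, 2)

# Starting exactly at x_*, the step returns (x_*, lambda_*) for either choice of B.
for b_diag in ([F(1), F(1)], [F(2), F(3)]):
    x_next, lam_next = sqp_step(x_star, b_diag)
    print(b_diag, x_next, lam_next)
```

Exact rational arithmetic is used so that the fixed point is reproduced
without rounding. The signature of `sqp_step` has no multiplier argument,
which mirrors the observation that the new multiplier is determined by $x_k$
and $B_k$ alone in this form of the step.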

From (2.1) we see that $\lambda_{k+1}$ does not depend explicitly on
$\lambda_k$. We should not expect to have $\lambda_{k+1} = \lambda_*$
whenever $\lambda_k = \lambda_*$. Moreover, in most cases there will exist a
manifold $\Omega \subset \mathbb{R}^n$ of dimension $n - m$ such that
$\lambda_{k+1} = \lambda_*$ whenever $x_k \in \Omega$. It follows that in the
worst-case analysis given by Theorem 5.1 the unbounded $q_1$-factor
situation is to be expected and cannot be removed from the theorem. The
surprise is that Theorem 5.1 says that if the q-convergence behavior in
$\lambda$ is not arbitrarily bad (unbounded $q_1$-factor), then it is
essentially optimal ($q_1$-factor of zero). It is interesting that both
notions are norm independent. We believe that while our numerical experience
dictates that in most cases we should expect q-superlinear convergence in
$\lambda$, Theorem 5.1 is actually sharp. This means that while the
q-superlinear convergence of $\{\lambda_k\}$ is likely in any given
computation, assumptions which imply it are mathematically restrictive.

It is interesting to point out that the r-convergence in $\lambda$ is
always superlinear and the unbounded $q_1$-factor occurs because the
estimate of the multiplier is exceptionally good an infinite number of
times.
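A small synthetic sequence shows how an error can be r-superlinear while its
$q_1$ factors are unbounded. The sketch below is an illustration under
assumptions: the error levels $2^{-k^2}$ and the rule that every third
iterate is "exceptionally good" are hypothetical choices made here, not
multiplier errors produced by an SQP method.

```python
def base(k):
    # Hypothetical q-superlinear envelope N_k = 2^(-k^2).
    return 2.0 ** (-(k * k))


def err(k):
    # Every third iterate is exceptionally accurate: it already has the
    # accuracy of the envelope two iterations later.
    return base(k + 2) if k % 3 == 0 else base(k)


ks = list(range(1, 19))

# q1 factors err(k+1)/err(k): huge right after each exceptionally good iterate.
q_factors = [err(k + 1) / err(k) for k in ks]

# Root factors err(k)^(1/k): these tend to 0, which is r-superlinear convergence.
roots = [err(k) ** (1.0 / k) for k in ks]

for k, q, r in zip(ks, q_factors, roots):
    print(f"k={k:2d}  q1 factor={q:.3e}  root factor={r:.3e}")
```

The error never exceeds the q-superlinear envelope, yet the ratio of
successive errors blows up at every index divisible by three, because a very
small error is followed by an ordinary one.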

It is also interesting to point out that in the modified SQP method studied
by Tapia and Whitley (1988) if it happens that $\lambda_k = \lambda_*$, then
the algorithm will converge in the subsequent iteration, i.e.,
$(x_{k+1}, \lambda_{k+1}) = (x_*, \lambda_*)$. This is due to the very
special structure of the eigenvalue problem. Hence it is not unreasonable
that they were able to establish the same surprising q-convergence rate of
$1 + \sqrt{2}$ for the pair $(x, \lambda)$, the variable $x$ and the
multiplier $\lambda$.

Theorem 5.2 is an up-to-date account of the convergence properties of
the SQP Broyden, PSB, DFP and BFGS secant methods. Since these methods
satisfy condition (3.5), from Proposition 3.3 we should expect them to
give q-superlinear convergence in $(x, \lambda)$, $x$ and $\lambda$.
Theorem 5.2 says that in the unusual case that we do not have q-superlinear
convergence in $\lambda$, i.e.

$$\limsup_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|\lambda_k - \lambda_*|} = \infty ,$$

we will still have

$$\lim_{k\to\infty} \frac{|\lambda_{k+1} - \lambda_*|}{|x_k - x_*|} = 0 .$$

Hence, even though it is possible that the $\lambda$-sequence may exhibit
bad q-behavior its convergence will be extremely fast. We must conclude that
q-convergence is an inappropriate and pessimistic measure of convergence for
the $\lambda$-sequence.